Pessimistic Risk-Aware Policy Learning in Contextual Bandits

arXiv.org Machine Learning

We study risk-aware offline policy learning, aiming to learn a decision rule from logged data that is optimal under general risk criteria. This problem is crucial in high-stakes domains where online interaction is infeasible and adverse outcomes must be carefully controlled. However, existing literature on offline contextual bandits either centers on expected-reward criteria or restricts risk considerations to policy evaluation instead of optimization. In this work, we propose a unified distributional framework for optimizing Lipschitz-continuous risk functionals, a broad class of risk measures encompassing mean-variance, entropic risk, and conditional value-at-risk, among others. By developing novel empirical concentration inequalities for importance sampling-based distributional estimators, our analysis derives data-dependent suboptimality bounds with an $\tilde{\mathcal{O}}(1/\sqrt{n})$ rate, without relying on restrictive uniform overlap assumptions. This rate is minimax optimal and matches that of risk-neutral offline policy optimization, indicating that optimizing general Lipschitz risk criteria incurs no additional statistical cost relative to the expected-reward criterion.
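
As a rough illustration of the risk functionals named in this abstract, the sketch below evaluates mean-variance, entropic risk, and conditional value-at-risk on a self-normalised importance-weighted empirical reward distribution. It is not the paper's estimator or its pessimistic learning rule. The function names, the sign conventions (larger is better, CVaR taken over the lower tail of rewards), and the parameters `lam`, `beta`, and `alpha` are all assumptions made for this example.

```python
import math


def weighted_distribution(rewards, weights):
    """Return sorted (reward, probability) pairs from importance weights.

    Assumption: weights[i] is the ratio target_policy / logging_policy for
    sample i; the weights are self-normalised so the probabilities sum to 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("importance weights must have a positive sum")
    return sorted((r, w / total) for r, w in zip(rewards, weights))


def mean_variance(dist, lam=1.0):
    """Mean reward penalised by lam times the variance."""
    mean = sum(p * r for r, p in dist)
    var = sum(p * (r - mean) ** 2 for r, p in dist)
    return mean - lam * var


def entropic_risk(dist, beta=1.0):
    """Entropic risk under the convention -(1/beta) * log E[exp(-beta * R)]."""
    return -math.log(sum(p * math.exp(-beta * r) for r, p in dist)) / beta


def cvar(dist, alpha=0.1):
    """Average reward over the worst alpha-fraction of the distribution."""
    remaining = alpha
    acc = 0.0
    for r, p in dist:  # rewards are in ascending order
        take = min(p, remaining)
        acc += take * r
        remaining -= take
        if remaining <= 0:
            break
    return acc / alpha
```

For example, with rewards `[0, 1, 2, 3]` and equal weights, `cvar(dist, alpha=0.5)` averages the two lowest rewards.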


Fast Rank-1 Lattice Targeted Sampling for Black-box Optimization

Neural Information Processing Systems

Black-box optimization has gained great attention for its success in recent applications. However, scaling up to high-dimensional problems with good query efficiency remains challenging. This paper proposes a novel Rank-1 Lattice Targeted Sampling (RLTS) technique to address this issue. Our RLTS benefits from random rank-1 lattice Quasi-Monte Carlo, which enables us to perform fast local exact Gaussian process (GP) training and inference with $O(n \log n)$ complexity w.r.t.




cdd0640218a27e9e2c0e52e324e25db0-Supplemental-Conference.pdf

Neural Information Processing Systems

The fair-ranking problem, which asks to rank a given set of items to maximize utility subject to group fairness constraints, has received attention in the fairness, information retrieval, and machine learning literature.




On the Convergence of Step Decay Step-Size for Stochastic Optimization

Neural Information Processing Systems

Step decay step-size schedules (constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an $\mathcal{O}(\ln T/\sqrt{T})$ rate.
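
A minimal sketch of a "constant and then cut" schedule follows, for readers unfamiliar with the idea. It is an illustration rather than the schedule analyzed in the paper: the function name, the decay factor `alpha`, the initial step size `eta0`, and the choice of roughly log_alpha(T) equal-length stages are assumptions made for this example.

```python
import math


def step_decay_step_size(t, total_steps, eta0=0.1, alpha=2.0):
    """Step size at iteration t (0-indexed) under a step decay schedule.

    Assumptions for this sketch: the horizon total_steps is split into about
    log_alpha(total_steps) stages of equal length, the step size is constant
    within a stage, and it is divided by alpha at each stage boundary.
    """
    if not 0 <= t < total_steps:
        raise ValueError("t must satisfy 0 <= t < total_steps")
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    num_stages = max(1, math.ceil(math.log(total_steps) / math.log(alpha)))
    stage_length = math.ceil(total_steps / num_stages)
    stage = t // stage_length  # index of the stage containing iteration t
    return eta0 / alpha ** stage
```

With `total_steps=100` and `alpha=2.0`, this gives 7 stages of 15 iterations each (the last one shorter), so the step size is halved every 15 iterations.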


28553688c204ddbb06a51e00684f8bb7-Supplemental-Conference.pdf

Neural Information Processing Systems

In the sequel, we empirically show the effect of different numbers of local updates on the fixed point. We consider the cases $K = 1$, $K = 10$, $K = 20$, and $K = 50$. From Assumption 1, it is obvious that $g_i(x, y)$ is convex-concave. Then, we conclude that there exists some $\eta_1 > 0$ such that $h(\eta) > 0$ for $0 < \eta < \eta_1$.